WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6 
H04L 9/32 



A1



(11) International Publication Number: WO 99/05819 

(43) International Publication Date: 4 February 1999 (04.02.99) 



(21) International Application Number: PCT/GB98/02187 

(22) International Filing Date: 22 July 1998 (22.07.98) 



(30) Priority Data:
9715411.6          23 July 1997 (23.07.97)          GB



(71) Applicant (for all designated States except US): CHANTILLEY 
CORPORATION LIMITED [GB/GB]; 28 Main Street, 
Mursley, Milton Keynes, Buckinghamshire MK17 0RT 

(GB).

(72) Inventor; and 

(75) Inventor/Applicant (for US only): HAWTHORNE, William, 
McMullan [GB/GB]; Kenmare, Bramerton Road, Surling- 
ham, Norfolk NR14 7DE (GB). 

(74) Agent: GIBSON, Stewart, Harry; Urquhart-Dykes & Lord, 
Three Trinity Court, 21-27 Newport Road, Cardiff CF2 
1AA (GB). 



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR,
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE,
GH, GM, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX,
NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM,
TR, TT, UA, UG, US, UZ, VN, YU, ZW, ARIPO patent
(GH, GM, KE, LS, MW, SD, SZ, UG, ZW), Eurasian patent
(AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent
(AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT,
LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI,
CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: DOCUMENT OR MESSAGE SECURITY ARRANGEMENTS USING A NUMERICAL HASH FUNCTION



[FIGURE 1: block diagram of the apparatus, showing input 18, buffer 20, ROM 12, MEMORY, MICROPROCESSOR 10, DATA STORE 16, RESULTS STORE 22, output 24, PRINTER 26 and MODEM 28]



(57) Abstract 



A document or message is protected against forgery or repudiation by processing a selected part or parts of the text of the document 
or message to form a hash, usually of fewer characters than the selected part or parts of the text. The processing comprises retrieving 
numerical values which define the respective characters of the selected part or parts of the text and making a calculation using the numerical 
values of the successive characters. Preferably the hash is added to the text.

DOCUMENT OR MESSAGE SECURITY ARRANGEMENTS USING A NUMERICAL HASH FUNCTION

The present invention relates to arrangements for the 
protection of documents against forgery or repudiation. The 
invention also relates to arrangements for the protection of 
electronically transmitted messages against forgery or 
repudiation.

It is common nowadays to provide security to documents 
through the use of holograms, watermarks, personal signature, 
notary stamps and other physical means: these all increase the 
difficulty for making unauthorised imitations or changes; 
however, they all require physical inspection, often involving 
forensic equipment and expertise, in order to detect a 
counterfeit. It is also becoming increasingly necessary to 
provide security for electronically transmitted messages. 

The present invention provides for the security of the 
text of a document or message by cryptographic techniques. 

In accordance with the present invention, there is
provided an apparatus which is arranged to process a selected
part or selected parts of the text of a document or message to
form a hash, the hash usually being of fewer characters than
the selected part or parts of the text, the processing
comprising retrieving numerical values which define the
respective characters of the selected part or parts of the text
and making a calculation using the numerical values of the
successive characters.

The apparatus may be arranged to receive or create a 
text in electronic form, then process this text to derive the 
hash of the selected part or parts of the text. The apparatus 
may further be arranged to add the hash to the text: 
typically, the apparatus then outputs the text, with the added 
hash, either for printing as a document or for electronic 
transmission. Alternatively the apparatus may be arranged to 
output the text and the hash separately (or store one and 
output the other).

The practical value of the hash is that it is sensitive
to any change or alteration in the selected part of the text
from which it is derived: it is not feasible to make a desired
alteration to that part of the text whilst preserving the same
hash value.

The hash thus forms a cryptographic signature which
makes forgery detectable on the basis of an assessment of the
content of the text and without the need for any forensic
examination of the document.

The hash algorithm is not applied to the whole text,
only to a selected part, or to selected parts. The or each
part is identified, or sealed, by predetermined characters or
combinations of characters immediately preceding and
immediately following it: for example, a series of tilde marks
(~) may be used.
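
By way of illustration only, the identification of sealed parts might be sketched as follows. The function name `sealed_parts`, the constant `SEAL` and the choice of exactly five tilde marks are assumptions made for this sketch; the text does not state whether the seal characters themselves enter the calculation, and here only the enclosed text is returned.

```python
import re

# ASSUMPTION: a seal is a run of exactly five tilde marks, as in the worked example.
SEAL = "~~~~~"


def sealed_parts(text: str, seal: str = SEAL) -> list[str]:
    """Return every part of `text` enclosed between an opening and a closing seal."""
    # Non-greedy match so that each opening seal pairs with the nearest closing seal;
    # DOTALL lets a sealed part span several lines.
    pattern = re.escape(seal) + r"(.*?)" + re.escape(seal)
    return re.findall(pattern, text, flags=re.DOTALL)


if __name__ == "__main__":
    document = "Dear Sir, ~~~~~Royalties shall be paid at 2 1/2%.~~~~~ Yours faithfully"
    print(sealed_parts(document))
```

Text outside the seals is ignored, so such text could be edited freely without affecting a hash derived from the sealed parts.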

Preferably the numerical values of the respective
characters of the selected text are their ASCII values: the
characters preferably include all keystrokes (including space,
return etc.); preferably the "alphabet" is restricted to all
keystrokes having ASCII values in the range 32 to 125 inclusive
and also including ASCII values for the "return".
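
A minimal sketch of this retrieval of numerical values is given below. The name `char_values` is hypothetical, and treating "return" as line feed (10) or carriage return (13) is an assumption, since the text does not say which code is meant. It may be noted that the tilde has ASCII value 126 and so falls outside the stated range.

```python
# ASSUMPTION: "return" is taken to mean line feed (10) or carriage return (13).
RETURN_CODES = {10, 13}


def char_values(part: str) -> list[int]:
    """Map each character of a sealed part to its ASCII value.

    Characters outside the restricted alphabet (32 to 125 inclusive, plus
    the return codes) are rejected rather than silently dropped.
    """
    values = []
    for ch in part:
        code = ord(ch)
        if 32 <= code <= 125 or code in RETURN_CODES:
            values.append(code)
        else:
            raise ValueError(f"character {ch!r} (code {code}) is outside the alphabet")
    return values


if __name__ == "__main__":
    print(char_values("shall"))
```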

Preferably the processing is recursive, in that the
calculation in respect of each character uses the result of the
calculation made in respect of at least one previous character.

Preferably the calculations for the first several (e.g.
10) characters use successive ones of a set of initial
variables: preferably the calculation for each subsequent
character uses, instead of an initial variable, the result of
the calculation in respect of a previous character.

Preferably each calculation also uses one of a
predetermined set of prime numbers. Preferably each
calculation uses an interim result to determine which of these
prime numbers is used to complete the calculation.

Preferably the processing involves at least a second
pass over the selected part or parts of the text: in other
words, once the calculation for the last character is
completed, a second series of successive calculations is
carried out on the characters, typically starting with the
first character, and using the results of the calculations of
the first series.

At the end of the above-described processing, the hash
is formed by taking selected digits from the results obtained
in a final plurality of the calculations: for example the
final two digits may be taken from each of the final 10
results, and a 20-digit hash formed by placing these 10 pairs
of digits in a given order.
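
The digit selection just described might be sketched as follows. The name `form_hash` is hypothetical, and the sample results are made-up numbers used purely for illustration; the sketch assumes the ten results are already supplied in the order in which their digit pairs are to be placed.

```python
def form_hash(final_results: list[int]) -> str:
    """Form a 20-digit hash from the final two digits of each of ten results."""
    if len(final_results) != 10:
        raise ValueError("exactly ten final results are expected")
    # result % 100 keeps the final two digits; :02d preserves a leading zero.
    return "".join(f"{result % 100:02d}" for result in final_results)


if __name__ == "__main__":
    # Made-up results, not taken from the figures.
    sample = [644, 5, 12345, 100, 99, 7, 20001, 31, 888, 10]
    print(form_hash(sample))  # 44054500990701318810
```

Zero-padding each pair keeps the hash at a fixed length even when a result ends in a single significant digit.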

One form of hash algorithm used in the invention is an
Objective Linguistic Hash (OLH). This is linguistic in that
it "reads" letters, numbers and other keys commonly used in the
preparation of documents. It is objective in that the hash
value produced can be verified by anyone using the algorithm.
The OLH algorithm produces a final number by acting recursively
one character at a time throughout the length of the message.

The variability of the message far exceeds the
variability of the final hash, so inevitably many different
messages would have the same hash value. However, it is
unfeasible to make a meaningful change to the message whilst
retaining the same hash number.

It will be appreciated that the invention may be
incorporated in a word processing apparatus. In this use, a
document is created in electronic form on the apparatus,
complete with the seal (e.g. series of tilde marks) at the
beginning and end of the or each selected part of the text.
A "sealing" command is then performed, whereupon the apparatus
automatically processes the "sealed" part or parts of the text
to create the hash, which is stored with the text.
Subsequently, the document can be altered or corrected as
necessary, then "re-sealed", to process the sealed part or
parts of the text again and create the hash afresh. Once the
document is finalised, it can be printed out, complete with the
hash.

The above-mentioned OLH algorithm may be modified to
provide a Subjective Linguistic Hash (SLH). This differs from
the OLH in that it is made subjective by being "seeded" with
secret information known only to an accredited authority:
thus, the processing of the selected or "sealed" part or parts
of the text is carried out using secret initial variables.
Preferably use is made of a seed, in the form of a very large
secret number (typically having 50 to 200 digits) known as the
Secret Primitive (SP). An algorithm is run, using the SP, to
produce the initial variables: preferably this algorithm also
uses a number of items of open information, known as Open
Primitives (OP's), contained in the document or message being
protected. The SLH algorithm may produce a plain hash
initially, then encrypt this using the SP as secret key: this
preserves the secrecy of the plain hash.

A further algorithm which can be used in accordance
with the invention is a Subjective Encrypted Hash (SEH)
algorithm. This involves encrypting an OLH hash, using secret
primitive values known only to a witnessing party, together
with open primitive values such as date and time. In this
case, the witnessing party uses an apparatus into which the OLH
of a document or message is keyed, together with the open
primitive values, and which encrypts the OLH using the SEH
algorithm, to create the SEH hash which is preferably printed
on the document, or on a label for application to the document.
Preferably the apparatus stores the initial OLH and the final
SEH, together with the open primitive values.

Embodiments of the present invention will now be
described by way of examples only and with reference to the
accompanying drawings, in which:

FIGURE 1 is a schematic diagram of an apparatus in 
accordance with the invention; 

FIGURE 2 sets out an example of a document text to be
processed;

FIGURE 3 gives an example of a set of initial variables 
to be used in the processing algorithm; 

FIGURE 4 is a table detailing the successive steps in
applying the processing algorithm to the document text of
Figure 2;

FIGURE 5 is a table detailing the successive steps in 
applying the processing algorithm in a second pass to the 
document text; and 

FIGURE 6 sets out the final 20-digit hash which is
created.

Referring to Figure 1, an apparatus in accordance with
the present invention comprises a microprocessor 10 having
connected to it a read-only memory (ROM) 12, a memory 14 for
predetermined values, and a data store 16. The ROM 12 holds
a hash algorithm and the memory 14 holds a set of initial
variables and additionally a set of 64 prime numbers (each of
5 digits) and also three prime numbers (preferably the prime
numbers 37, 17 and 7). The apparatus has an input port 18
coupled via a buffer 20 to the data store 16. The message or
text to be processed may be received in electronic form on the
input port 18, or it may be already stored in the data store 16.
The microprocessor processes the message or text in accordance
with the algorithm held in the ROM 12 and using the
predetermined values held in the memory 14, in the manner which
will be described below: the partial results of the hash
calculation are written to and read from a further store 22.
Finally, the calculated hash is added to the electronic text
in the data store 16. The apparatus has a data output port 24,
through which the message text complete with its hash can be
sent from the data store 16, whether to a printer 26 or a
transmission modem 28 or other computer peripheral device.

Figures 2 to 6 provide a worked example which uses an
OLH algorithm on a selected part of the text (Figure 2) of a
document, typically a word processed document, namely the part
between the two series of five tilde marks (~~~~~). The hash
algorithm uses a set of initial values or variables (IV's), in
this example 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 (Figure 3):
the algorithm additionally uses a set of 64 prime numbers (each
of 5 digits) used as modulators and also three prime numbers,
preferably 37, 17 and 7, as will be shown below; the OLH
algorithm is stored in the ROM 12 of Figure 1 and the initial
variables and the prime numbers just referred to are stored in
memory 14. The tables of Figures 4 and 5 show the processing
carried out to create a 20-digit hash. The following
description, referring firstly to Figure 4, shows the manner
in which the calculations proceed, taking for example the 16th
row. It will be noted that the part of the message of Figure
2 to be processed is set out character-by-character in the
first column of the table of Figure 4: the rows are numbered
0 to 9 cyclically (starting with 1) in the second column; the
initial variables of Figure 3 are used in turn in the 5th
column, for the first 10 rows.

Thus, reading across the 16th row in the table of
Figure 4, we have:

s       = the input character
6       = the "y" value, i.e. the choice of recursive P(y)
115     = the ASCII value of "s"
25149   = the value of the result on the preceding row,
          namely P(5) = 25149
16002   = the last value of P(y), i.e. P(6)
41266   = 115 + 25149 + 16002 (the sum of the values in the
          three preceding columns in the same row)
16      = the value of n, where n is the ordinal of the
          character in the text
5405846 = 41266 x (115 + 16)
37      = the value of Z
        = (37x115 + 17x16 + 7x25149 + 30539) mod 64
          (using the prime numbers 37, 17, 7)
27299   = the value of the 37th of the set of 64 5-digit
          prime numbers
644     = 5405846 mod 27299
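
The arithmetic of the 16th row can be checked with the short sketch below. All numbers are those quoted for that row; the variable names are chosen for this sketch only, and the constant 30539 is reproduced as given because its origin is not explained in the text.

```python
# Values quoted for the 16th row of the table of Figure 4.
ascii_value = ord("s")   # 115
prev_result = 25149      # result on the preceding row, P(5)
last_p_y = 16002         # last value of P(y) for y = 6
n = 16                   # ordinal of the character in the text

total = ascii_value + prev_result + last_p_y   # sum of the three preceding columns
product = total * (ascii_value + n)            # multiplied by (ASCII value + n)

# Z selects one of the 64 primes; 30539 is taken from the text as given.
z = (37 * ascii_value + 17 * n + 7 * prev_result + 30539) % 64

prime = 27299            # the prime quoted for Z = 37
result = product % prime

print(total, product, z, result)  # 41266 5405846 37 644
```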

It will be noted that the calculation on each row in
the table of Figure 4 is recursive, in that it uses results
produced on previous rows (see the 4th and 5th items in each
row). Further, in the example shown, the algorithm makes a
second pass over the sealed part of the text: the successive
calculations of this are set out in the table of Figure 5.
Finally, the 20-digit OLH hash is produced by selecting the
final two digits of the results (final column) of the final 10
rows, placed in the order of recursive P(y) = 0 to 9: this
hash is set out in Figure 6.

Any attempt to alter the sealed part of the text,
whilst retaining the same hash value, would require subsequent
alterations in all further recursive steps to the end of that
text. This is inherently difficult, but made more so by
continuing the recursion back to the beginning of the sealed
text for the second pass: a third pass may additionally be
made.
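
The general shape of such a recursive, multi-pass calculation might be sketched as below. This is a toy and not the algorithm of the figures: the 64 primes are not listed in the text, so the first 64 primes above 10000 stand in for them; the selector omits the unexplained constant of the worked example; and the starting value for the "preceding row", the continuation of the ordinal through later passes and zero-based prime indexing are all assumptions. The name `toy_recursive_hash` is hypothetical.

```python
# Toy sketch only: every point marked ASSUMPTION is a guess made for illustration.

INITIAL_VARIABLES = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]  # values quoted in the text


def _is_prime(k: int) -> bool:
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


# ASSUMPTION: stand-in primes, since the text does not list its 64 five-digit primes.
PRIMES = []
_candidate = 10001
while len(PRIMES) < 64:
    if _is_prime(_candidate):
        PRIMES.append(_candidate)
    _candidate += 1


def toy_recursive_hash(text: str, passes: int = 2) -> str:
    if len(text) * passes < 10:
        raise ValueError("at least 10 calculation rows are needed")
    # p[y] holds the latest result for cyclic index y; rows run 1..9, 0, so the
    # first row (y = 1) starts from the first initial variable.
    p = {(i + 1) % 10: iv for i, iv in enumerate(INITIAL_VARIABLES)}
    prev = 0  # ASSUMPTION: no preceding result exists before the first row
    n = 0     # ASSUMPTION: the ordinal keeps counting through later passes
    for _ in range(passes):
        for ch in text:
            n += 1
            y = n % 10
            a = ord(ch)
            total = a + prev + p[y]
            product = total * (a + n)
            # ASSUMPTION: simplified selector without the extra constant.
            z = (37 * a + 17 * n + 7 * prev) % 64
            result = product % PRIMES[z]  # ASSUMPTION: zero-based indexing
            p[y] = result
            prev = result
    # Final two digits of the latest result for y = 0 to 9, in that order.
    return "".join(f"{p[y] % 100:02d}" for y in range(10))


if __name__ == "__main__":
    print(toy_recursive_hash("Royalties shall be paid at 2 1/2%."))
```

Because each row feeds both the next row and the row ten places later, a change to one character propagates through every subsequent calculation, and the second pass carries it back over the characters that preceded the change.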

The above-described OLH algorithm may be modified to
form a Subjective Linguistic Hash (SLH). The SLH differs from
the OLH in that it is made subjective by being "seeded" with
secret information known only to an accredited authority: the
initial values (IV's) are therefore secret. Preferably the
seed is a very large secret number, typically with 50 to 200
digits, known as the Secret Primitive (SP) and known only to
the issuing authority. The SLH algorithm "fuses" the SP with
open information, known as Open Primitives (OP's), contained
in the document or message, to produce the initial variables
(IV's). Preferably the algorithm produces a "plain hash" in
the first instance, which is then doubly encrypted using the
SP as secret key. This preserves the secrecy of the plain hash
and makes it mathematically unfeasible to work backwards
through the document to discover the primitives.

A further algorithm which can be used is a Subjective
Encrypted Hash (SEH). The SEH involves encrypting an OLH hash.
The encryption incorporates secret primitive (SP) values known
only to a witnessing party, and open primitive (OP) values such
as date and time or other non-repeating factors. Further, the
encryption is one-way, because the OLH is also fused with the
OP and SP values. Since the key is therefore part of the
message, the crypt cannot be reversed by application of the
key. Every output value of a fixed OLH is therefore distinct,
due to non-repeating elements in the OP's.
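
The idea of binding a fixed OLH to non-repeating open primitives under a secret can be illustrated with a standard keyed construction. The sketch below uses HMAC-SHA256 from the Python standard library purely as a stand-in: it is not the SEH algorithm, whose details are not given, and the function name `keyed_seal`, the separator and the sample values are hypothetical.

```python
import hashlib
import hmac


def keyed_seal(olh: str, open_primitives: str, secret_primitive: bytes) -> str:
    """Bind an OLH value to open primitives under a secret key (HMAC stand-in)."""
    # The OLH and the open primitives are combined into one message, so the
    # output changes whenever either of them changes.
    message = f"{olh}|{open_primitives}".encode("ascii")
    return hmac.new(secret_primitive, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    olh = "12345678901234567890"            # hypothetical 20-digit OLH
    secret = b"demonstration secret only"   # hypothetical secret primitive
    print(keyed_seal(olh, "1998-07-22 10:15:00", secret))
    print(keyed_seal(olh, "1998-07-22 10:15:01", secret))
```

With a date and time among the open primitives, two sealings of the same OLH one second apart yield different outputs, and neither can be recomputed without the secret.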

A number of possible applications of the invention will
now be described by way of examples only.

A first application of the invention is for preventing
fraudulent alteration of a Vehicle Registration Document. It
is well known that stolen or redundant Vehicle Registration
Documents have a value in the process of "ringing", that is,
altering the identity of a stolen car. To complete the fraud,
a plausible Vehicle Registration Document is required. In a
first case, the ringer will have to make a forged alteration
to the document, for example, to cover a re-spray in a
different colour. In a second case, if the ringer can alter
the identity of a car, exactly to match the Vehicle
Registration Document, then the fraud is undetectable to an
unsuspecting buyer.

The present invention can prevent fraud in either case
in the following way. When a vehicle is insured, the important
fixed elements of the particular information concerning the
vehicle and its keeper form the message parts of the hash
algorithm. The secret primitives are in the possession of the
insurer of the vehicle.

Examples of message parts are:

Owner, Keeper, Registration Number, Make, 
Model, Colour, Chassis Number, Engine number. 

These parts are impossible to alter in a fraudulent
way, without knowledge of the corresponding altered value of
the SLH. Thus an SLH hash marked on the Registration Document
protects against the first case of fraud. To solve the problem
of the second case, OP's are added as follows:

Insurance Renewal Date, Mileage on last
insurance, Stated value on last insurance.

These OP's have to be altered in the second case to
give a vehicle a new false history. It is not possible for a
ringer to do this because the true history is protected by
earlier SLH's.

The SP for a given Insurance Company or other authority
would preferably be a very large number, typically of 50 to 200
digits. It is preferable that the insurance company produces
an updated SLH each year, using details of the vehicle and its
keeper held or added to its stored record for that client, and
including the vehicle mileage: the SLH may then be printed.

In a variation applicable to a vehicle registration
document, the insurance company may produce an SLH each year,
using details of the vehicle and its keeper, including the
vehicle mileage: the SLH is then printed on a sticker,
together with open information of the vehicle (e.g. mileage,
value of the vehicle) for the keeper to stick on the vehicle
registration document. Each time the insurance is renewed, an
additional such sticker is created for the keeper to add to the
registration document. It will be appreciated that the
registration document will thus include, in respect of each
renewal, a hash related to data printed in selected parts or
fields of the document.

A second application of the invention is relevant to
high value tickets, bought in advance where there is high risk
of fraud. This form of fraud is rife for example in the sale
of tickets for long-awaited pop concerts where the forged
tickets are sold to young people in a social context where they
are likely to be susceptible. Nothing can prevent a buyer from
purchasing a ticket where there is no ready means of verifying
its data, but with a suitable warning this application of the
invention exerts psychological pressure due to the uncertainty
that a ticket bought from an unofficial source will be valid
on the day of the concert. A suitable warning might read as
follows:

Warning: If you have bought this ticket from an
unauthorised source, it may be a perfect forgery. Only
genuine tickets will pass the electronic test at the
turnstile. Do not run the risk of being turned away.

Each event is given an SP which is available as an
input to the software used at legitimate outlets. This SP is 
only released to points of entry to the concert immediately 
before the crowds start to appear. The point of entry has a 
machine for reading the hash from the ticket: the hash may be 
printed on the ticket, at the time of issue, in both human- 
readable and machine-readable form. The OP is a combination 
of the date and time of sale, correct to the nearest second, 
and the name of the buyer. The SLH is also printed on the 
ticket. Even if the fraudster prints a very recent time and 
date, it is mathematically unfeasible to calculate the 
appropriate SLH, so he has a hazardous task of persuading the 
buyer that he/she must attach no significance to the lapse of 
time. Further, the buyer who reads the warning on the reverse 
side of the ticket is put under the psychological pressure of 
having to wait for the concert itself before knowing whether 
the ticket is valid. 

A third application of the invention is relevant to 
National Identity Cards which display a photograph and personal 
details of the legitimate owner. The invention provides for 
a massive SP (containing at least 400 figures) held in a tamper 
proof location. The printed matter of the card is classified
either as message parts to be hashed or OP's. The SLH is
printed on the face of the card as additional information. 
This prevents alteration of a card or the printing of a false 
identity. 

A fourth application of the invention is the use of a
Trusted Third Party such as an accredited Notary Public to
provide an SEH when supplied with a pre-calculated OLH for a
"sealed" part of a document. The document itself may either
be sent in plain or in crypt. The function of the notary is
to use the OLH to calculate the SEH. The document may be
processed to provide it with a double header, the OLH and the 
SEH which incorporates a date/time stamp. In the event of a 
dispute both "versions" of the disputed text can be tested by 
an OLH, but only the valid OLH will have the proper SEH.

Claims

1) An apparatus which is arranged to process a selected
part or parts of the text of a document or message to form a
hash, the hash usually being of fewer characters than the
selected part or parts of the text, the processing comprising
retrieving numerical values which define the respective
characters of the selected part or parts of the text and making
a calculation using the numerical values of the successive
characters.

2) An apparatus as claimed in claim 1, which is arranged
to receive or create said text in electronic form, then process
said text to derive said hash.

3) An apparatus as claimed in claim 2, arranged to add
said hash to said text.

4) An apparatus as claimed in claim 3, arranged to output
said text, with the added hash.

5) An apparatus as claimed in claim 2, arranged to output
said text and its hash separately, or to store one and output
the other.

6) An apparatus as claimed in any preceding claim,
arranged for the or each said part of said text to be
identified by predetermined characters or combinations of
characters immediately preceding and following it.

7) An apparatus as claimed in claim 6, in which each said
identifier comprises a series of tilde marks.

8) An apparatus as claimed in any preceding claim, in
which said numerical values of the respective characters of the
selected text are their ASCII values.

9) An apparatus as claimed in claim 8, in which an
alphabet which includes all said characters is restricted to
all keystrokes having ASCII values in the range 32 to 125
inclusive.

10) An apparatus as claimed in any preceding claim, 
arranged so that said processing of said selected part or parts 
of said text comprises recursive processing, in that the 
calculation in respect of each character uses the result of the 
calculation made in respect of at least one previous character. 

11) An apparatus as claimed in claim 10, arranged so that 
the calculations made for a first plurality of characters use 
successive ones of a set of initial variables. 

12) An apparatus as claimed in claim 10 or 11, arranged so 
that each said calculation also uses one of a predetermined set 
of prime numbers.

13) An apparatus as claimed in claim 12, arranged such that
each said calculation uses an interim result to determine which 
of said prime numbers is used to continue the calculation. 

14) An apparatus as claimed in any preceding claim, 
arranged so that said processing involves at least a second 
pass over the selected part or parts of said text. 

15) An apparatus as claimed in any preceding claim, 
arranged so that at the end of said processing, the hash is 
formed by taking selected digits from the results obtained in 
a final plurality of said calculations.

16) An apparatus as claimed in any preceding claim, 
arranged such that said hash is seeded with secret information. 

17) An apparatus as claimed in claim 16, arranged such that 
said processing is carried out using secret initial variables. 

18) An apparatus as claimed in any preceding claim,
arranged to encrypt said hash.

19) An apparatus as claimed in claim 18, arranged to store
said hash and the encrypted hash formed from it.

20) An article carrying information in the form of printed 
or electronic text and also carrying a hash formed from a 
selected part or parts of said text, the hash usually being of
fewer characters than the selected part or parts of said text 
and formed by making a calculation using numerical values which 
define the respective characters of said selected part or parts 
of said text. 

21) A process of forming a hash from a selected part or
parts of the text of a document or message, the process 
comprising retrieving numerical values which define the 
respective characters of the selected part or parts of said 
text and making a calculation using the numerical values of the 
successive characters, said hash usually being of fewer 
characters than said selected part or parts of the text.

[FIGURE 2: example document text]

I shall assume that the following is agreeable unless
I hear otherwise:

~~~~~Royalties shall be paid at 2 1/2%.~~~~~